
Add support for downloading ERA5 pressure levels data product (new) #197

Merged
glwagner merged 62 commits into main from eq/era5_pressure_levels
Apr 30, 2026


@ewquon ewquon commented Apr 30, 2026

This replaces #93. All previous comments and suggestions have already been addressed; the only difference is that this new PR originates from this repo rather than from my fork. I opened a new PR because CI run from my fork was failing due to empty credentials.

Closes #88.

ewquon added 30 commits March 3, 2026 11:08
…ressureLevels

  - Updated exports to include ERA5HourlyPressureLevels, ERA5MonthlyPressureLevels, ERA5_all_pressure_levels, pressure_field
  - Added using Statistics, Oceananigans.Fields.CenterField/interior, Oceananigans.BoundaryConditions.fill_halo_regions!, native_grid, InverseGravity to imports
  - Extended import block to include is_three_dimensional, reversed_vertical_axis, conversion_units
  - Added ERA5_all_pressure_levels constant (37 standard hPa levels)
  - Added ERA5PressureDataset, ERA5HourlyPressureLevels, ERA5MonthlyPressureLevels with keyword constructors
  - Added ERA5PressureMetadata{D} and ERA5PressureMetadatum type aliases
  - Added Base.size, all_dates, is_three_dimensional, reversed_vertical_axis dispatches
  - Added ERA5PL_dataset_variable_names and ERA5PL_netcdf_variable_names dicts (15 variables each)
  - Added available_variables, dataset_variable_name, netcdf_variable_name, conversion_units dispatches
  - Added retrieve_data(::ERA5PressureMetadatum) — reads 4D NetCDF, reverses vertical axis
  - Added metadata_prefix(::ERA5PressureMetadata) — uses ERA5PL_dataset_variable_names for filename construction
  - Added _std_atm_geopotential_height, _std_atm_z_interfaces, z_interfaces(::ERA5PressureMetadata)
  - Added pressure_field and mean_geopotential_heights
Combined CDSAPI requests will return a single combined netcdf; set
cleanup=false to keep the "_tmp_multi_<datetime>_*" files
Note: inpainting is now turned OFF for 2D data -- the reanalysis data
should be complete and turning on inpainting would artificially fill in
data that should be masked (e.g., ocean quantities over land)

Tested for 2D (single levels) and 3D (pressure levels)
to disambiguate from ERA5PressureLevels*
Clipped field will match downloaded bounding box, tested with ERA5
ewquon and others added 5 commits April 29, 2026 13:04
Per @glwagner's review: a `Field{Nothing, Nothing, Center}` represents
the same per-level pressure values as the previous `CenterField` but
without copying across the full horizontal grid. Use `set!` for the
column assignment, drop the per-k loop, and let the field's eltype
follow the grid (no more hardcoded `Float32`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Surface-level testsets each downloaded `2m_temperature` then removed it,
forcing the next testset to re-download the same bytes — three CDS
round-trips for one fixture. Keep the pre-clean only in the testset
that's testing the download path itself; let downstream testsets reuse
the file and run cleanup in the last consumer.

Pressure-level "Geopotential height conversion" pre-cleaned the
`geopotential_...nc` that the previous testset's `z_interfaces`
side-effect leaves on disk, then re-downloaded it. Drop the pre-clean
and the redundant explicit `download_dataset(meta_z)` call (Field()
already downloads if needed).

Net: surface round-trips drop 3 → 1, geopotential round-trips drop 2 → 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov

codecov Bot commented Apr 30, 2026

- Add `build_era5_area` methods for `Column` (Linear and Nearest) so
  `FieldTimeSeries(Metadata(...; region=Column))` no longer hits MethodError.
  Linear pads ε=0.3° (slightly more than ERA5's 0.25° native spacing) so the
  downloaded file contains the 2x2 stencil bilinear interpolation needs;
  Nearest uses ε=1e-3°.

- Realign `dataset_variable_name(::ERA5*Metadata)` to return the in-file
  short name (e.g. "u") instead of the CDS API catalog name
  ("u_component_of_wind"), matching the docstring ("the name used for the
  variable in its raw dataset file"). The CDS API name is still accessed
  via the `*_dataset_variable_names` dict directly in CDSAPIExt. Drops
  the now-redundant `netcdf_variable_name` methods. This fixes
  `column_field_from_file` for ERA5 — it calls `dataset_variable_name`
  generically and previously got the wrong name back.

- Validate cached file's vertical extent in `column_field_from_file` and
  `mean_geopotential_heights`. A stale cache from a previous run with
  different `pressure_levels` previously produced silent NaN data or a
  cryptic broadcast DimensionMismatch; now throws a clear actionable error.

- Tighten warning text in `z_interfaces` ("Failed to derive geopotential
  heights" rather than "Failed to download") and attach `catch_backtrace`
  so the underlying cause is visible in the warning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
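
The `Column` padding described in the first bullet above can be illustrated with a minimal sketch. It assumes the CDS request `area` is ordered `[North, West, South, East]` and uses the ε values quoted in the commit message; `padded_column_area` and its keyword are hypothetical names, not the extension's `build_era5_area` methods.

```julia
# Hypothetical helper for illustration only; not the extension's `build_era5_area`.
# Assumes the CDS `area` convention [North, West, South, East], in degrees.
function padded_column_area(longitude, latitude; interpolation = :linear)
    # ε values come from the commit message: 0.3° for Linear, 1e-3° for Nearest.
    ε = interpolation === :linear ? 0.3 : 1e-3
    north = min(latitude + ε, 90.0)   # clamp to valid latitudes
    south = max(latitude - ε, -90.0)
    return [north, longitude - ε, south, longitude + ε]
end

area = padded_column_area(-150.0, 35.0)
```

With ε = 0.3° the box is 0.6° wide in each direction, so on a 0.25° grid it spans at least two grid points per axis, which is what a 2x2 bilinear stencil needs.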
ewquon and others added 2 commits April 30, 2026 09:08
Two new top-level testsets exercise internals of `NumericalEarthCDSAPIExt`
that were previously uncovered. All tests are hermetic (no CDS API, no
filesystem dependencies beyond `mktempdir`).

- "ERA5 CDSAPIExt dispatch helpers and area construction":
    * `cds_product`, `cds_varnames`, `nc_varnames`, `coord_vars` for
      single-level and pressure-level datasets;
    * `extra_request_keys!` (no-op for single level, populates
      `pressure_level` for pressure-level datasets);
    * `build_era5_area` for `Nothing`, `BoundingBox` (both axes set,
      one axis missing), `Column{Linear}`, `Column{Nearest}`.

- "ERA5 CDSAPIExt NetCDF copy and split helpers":
    * `ncvar_copy!` round-trips data, attributes, and fill values;
    * `ncvar_copy_tslice!` correctly handles both time-dependent and
      time-independent variables;
    * `split_era5_nc` and `split_era5_nc_multistep` produce one output
      file per (variable[, timestep]) request, skipping variables not
      present in the source.

Synthetic ERA5-shaped NetCDFs are written via NCDatasets and discarded
with `mktempdir`; the extension's private symbols are reached through
`Base.get_extension`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread examples/ERA5_bounded_pressure_level_data.jl Outdated
Comment thread examples/ERA5_bounded_pressure_level_data.jl
Comment thread examples/ERA5_winds_and_stokes_drift.jl Outdated
Member

@glwagner glwagner left a comment

Should we combine the ERA5 data demos into a single example? And add to docs? Generally we should not have examples that aren't in the docs (I believe there are still some orphaned right now, but we need to clean that up)

Comment thread src/DataWrangling/DataWrangling.jl
Comment thread src/DataWrangling/metadata_field.jl Outdated
ewquon and others added 5 commits April 30, 2026 09:57
Cover the parts of `src/DataWrangling/ERA5/ERA5_single_levels.jl`
that the integration tests don't exercise directly: hourly `all_dates`
step (mirrors the existing monthly test), the API/netcdf variable-name
dicts staying in sync, the `available_variables` / `dataset_variable_name`
dispatch (catches the easy swap between API catalog name and netcdf
short name), `default_inpainting` returning `nothing` (the wrong value
here silently makes Field construction expensive), and `metadata_prefix`
filename construction across the single-date / multi-date / no-region
branches plus filename-safety transformations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extract `_group_by_calendar_day(datetimes)` from the inline
comprehensions in two `download_dataset` overloads so the grouping
logic is testable in isolation. Test boundary cases:

- 00:00 belongs to its own day, not the previous one
- multi-day interleaved input
- duplicate datetimes are preserved
- single-element input

Also add tests for the `skip_existing=true` short-circuit in three
multi-file paths (multi-variable pressure-level, single-variable
multi-date, multi-variable multi-date). Pre-create the expected output
files in a tempdir and assert each path returns without invoking
CDSAPI; if the short-circuit ever regresses the test will throw a
credentials/network error and fail loudly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
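
The grouping behaviour that the boundary cases above describe can be sketched as follows. The real signature and return type of `_group_by_calendar_day` are not shown in this conversation, so `group_by_calendar_day` below is a hypothetical stand-in that returns day-sorted `Date => Vector{DateTime}` pairs.

```julia
using Dates

# Hypothetical stand-in for `_group_by_calendar_day`; the return type is an assumption.
function group_by_calendar_day(datetimes)
    groups = Dict{Date, Vector{DateTime}}()
    for dt in datetimes
        # Date(dt) truncates to the calendar day, so 00:00 stays in its own day.
        push!(get!(groups, Date(dt), DateTime[]), dt)
    end
    # Return the days in sorted order; duplicates and input order within a day are kept.
    return [day => groups[day] for day in sort!(collect(keys(groups)))]
end

datetimes = [DateTime(2020, 1, 2, 0), DateTime(2020, 1, 1, 23),
             DateTime(2020, 1, 2, 0), DateTime(2020, 1, 1, 0)]
grouped = group_by_calendar_day(datetimes)
```

Here the interleaved input yields two groups, and the duplicated midnight entry on 2020-01-02 appears twice in its group.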
…ls, and CDSAPI ext

Cover several pure functions and dispatch overloads that the integration tests
either skip or don't exercise directly. Each test is no-network and self-contained.

`metadata_field.jl`:
- `restrict` (BoundingBox grid construction): identity, half-domain, small bbox,
  off-origin bbox, and the `Nothing` pass-through dispatch.
- `restrict_location` for all three region kinds (`BoundingBox` / `Nothing` /
  `Column`), confirming the Column path reduces horizontal locations to `Nothing`.

`ERA5_pressure_levels.jl`:
- Constructors sort levels descending: pass ASCENDING input
  (`[500, 850]hPa`) so the test fails if `sort(...; rev=true)` regresses to a
  no-op or different order. Covered for both Hourly and Monthly variants.
- `stagger`: pure function that converts ascending centers to Nz+1 staggered
  interfaces. Covers two-element evenly-spaced, three-element evenly-spaced,
  and three-element irregular cases (verifies extrapolation formula at top/bot
  and midpoint formula in the interior).
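
A minimal sketch of the staggering described above, for readers unfamiliar with the idea. The commit message does not give the exact extrapolation formula, so the half-spacing linear extrapolation used at the two ends is an assumption, and `stagger_sketch` is a hypothetical name rather than the package's `stagger`.

```julia
# Hypothetical illustration; the end-point extrapolation rule is an assumption.
function stagger_sketch(centers::AbstractVector)
    Nz = length(centers)
    Nz ≥ 2 || throw(ArgumentError("need at least two centers"))
    interfaces = zeros(float(eltype(centers)), Nz + 1)
    # Interior interfaces sit at the midpoint between neighbouring centers.
    for k in 2:Nz
        interfaces[k] = (centers[k - 1] + centers[k]) / 2
    end
    # End interfaces extend half of the adjacent spacing beyond the outermost centers.
    interfaces[1] = centers[1] - (centers[2] - centers[1]) / 2
    interfaces[Nz + 1] = centers[Nz] + (centers[Nz] - centers[Nz - 1]) / 2
    return interfaces
end

even = stagger_sketch([0.0, 2.0])
irregular = stagger_sketch([0.0, 1.0, 4.0])
```

Under these assumptions, every center lies midway between its two interfaces when the spacing is even, while irregular spacing only guarantees that for the outermost cells.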

`NumericalEarthCDSAPIExt.jl`:
- `is_zip`: ZIP magic header detected, non-magic bytes rejected, short
  (<4 byte) files rejected.
- `foreach_nc`: non-zip path calls `f` exactly once with the input path; zip
  path extracts and visits each `.nc`, ignoring non-`.nc` entries.
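
For context, the ZIP check described above amounts to comparing the first four bytes of a file against the local-file-header signature `PK\x03\x04`. The sketch below is a hypothetical re-implementation (`looks_like_zip` is not the extension's `is_zip`), and it deliberately ignores the different signature used by empty archives.

```julia
# Hypothetical re-implementation for illustration; not the extension's `is_zip`.
const ZIP_MAGIC = UInt8[0x50, 0x4b, 0x03, 0x04]  # "PK\x03\x04", ZIP local file header

function looks_like_zip(path::AbstractString)
    # `read(io, 4)` returns at most 4 bytes, so short files simply fail the comparison.
    header = open(io -> read(io, 4), path)
    return header == ZIP_MAGIC
end

results = mktempdir() do dir
    zip_like = joinpath(dir, "payload.zip")
    plain = joinpath(dir, "payload.nc")
    short = joinpath(dir, "short.bin")
    write(zip_like, UInt8[0x50, 0x4b, 0x03, 0x04, 0x00])
    write(plain, "CDF data")
    write(short, UInt8[0x50, 0x4b])
    (looks_like_zip(zip_like), looks_like_zip(plain), looks_like_zip(short))
end
```

Checking the content rather than the file extension matters here because, per the commit messages, a download may arrive either as a single NetCDF file or as an archive of several.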

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread src/DataWrangling/DataWrangling.jl Outdated
Co-authored-by: Gregory L. Wagner <wagner.greg@gmail.com>
@glwagner glwagner merged commit 0d9cfda into main Apr 30, 2026
10 checks passed
@glwagner glwagner deleted the eq/era5_pressure_levels branch April 30, 2026 23:33
Development

Successfully merging this pull request may close these issues.

ERA5 data wranglers only set up for single levels

2 participants